Skip to content

[Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation - RelToTrinoConverter#438

Closed
findinpath wants to merge 1 commit intolinkedin:masterfrom
findinpath:findinpath/cast-fields-in-union
Closed

[Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation - RelToTrinoConverter#438
findinpath wants to merge 1 commit intolinkedin:masterfrom
findinpath:findinpath/cast-fields-in-union

Conversation

@findinpath
Copy link
Copy Markdown
Contributor

What changes are proposed in this pull request, and why are they necessary?

In case of dealing with Hive views which make use of set operation (e.g. UNION) ensure that the fields from the inner SELECT statements have the same type as the field types of the set operation.

This patch adds the method com.linkedin.coral.trino.rel2trino.RelToTrinoConverter#setOpToSql which is used by all set operations UNION, INTERSECT, MINUS.

The proactive casting makes sure that Trino doesn't encounter anymore coercion issues while translating Hive views containing set statements.

Relates to trinodb/trino#18337

How was this patch tested?

This patch has unit test coverage.

@findinpath findinpath force-pushed the findinpath/cast-fields-in-union branch 3 times, most recently from 97b61be to 6eec47e Compare July 19, 2023 23:13
@aastha25
Copy link
Copy Markdown
Contributor

@findinpath can you implement this transformation as a SqlCallTransformer in HiveSqlNodeToCoralSqlNodeConverter? For example: a UnionSqlCallTransformer where the relDataType of the union-ed branches are compared and if there's a mismatch, CAST is introduced.

@wmoustafa
Copy link
Copy Markdown
Contributor

@findinpath can you implement this transformation as a SqlCallTransformer in HiveSqlNodeToCoralSqlNodeConverter? For example: a UnionSqlCallTransformer where the relDataType of the union-ed branches are compared and if there's a mismatch, CAST is introduced.

+1 since we are migrating all RelNode language specific transformations from the SqlNode layer, either on the Hive side or Trino side.

@wmoustafa
Copy link
Copy Markdown
Contributor

wmoustafa commented Jul 25, 2023

Just discussed with @aastha25 after testing the queries natively on Trino. It sounds the SQL text is correct, but the UNION operator type derivation in Calcite returns the type of the left operand as the type of the row. I think the correct solution is to extend the UNION operator and override the type derivation so that it does not always use the left operand's type but instead consider upcast cases like this. Coral implements a similar approach in HiveUncollect operator.

@findinpath findinpath force-pushed the findinpath/cast-fields-in-union branch 2 times, most recently from fc9a8f1 to 782340e Compare July 31, 2023 16:11
@findinpath
Copy link
Copy Markdown
Contributor Author

@aastha25 / @wmoustafa AC . Could you please review again the PR?

@findinpath findinpath requested review from aastha25 and wmoustafa July 31, 2023 16:12
@findinpath findinpath force-pushed the findinpath/cast-fields-in-union branch from 782340e to ea1397b Compare July 31, 2023 16:15
@findinpath findinpath changed the title [Coral-Trino] Cast fields, if necessary, to the output type of the set operation [Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation Aug 1, 2023
@findinpath findinpath changed the title [Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation [Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation - RelToTrinoConverter Aug 11, 2023
@findinpath
Copy link
Copy Markdown
Contributor Author

@aastha25 , @wmoustafa can we please proceed with the review for this PR?

…set operation

In case of dealing with Hive views which make use of set operation (e.g. UNION)
ensure that the `char` fields from the inner SELECT statements have the same type
as the output field types of the set operation.

Due to wrong coercion between `varchar` and `char` in Trino, as described in
trinodb/trino#9031
, a work-around needs to be applied in case of translating Hive views which
contain a UNION dealing with char and varchar types.
The work-around consists in the explicit cast of the field
having char type towards varchar type corresponding of the set operation
output type.
@findinpath findinpath force-pushed the findinpath/cast-fields-in-union branch from ea1397b to a504411 Compare August 23, 2023 05:23
@wmoustafa
Copy link
Copy Markdown
Contributor

wmoustafa commented Sep 9, 2023

@findinpath Thank you for filing #442. I still believe #442 is the way to go since:

  • The issue is very specific to Trino so handling on the output side of the transformation is typically the expectation.
  • Specifically since this issue is sort of an unexpected behavior in Trino, updating Coral IR to accommodate it can cause significant backward incompatibilities and in an unexpected manner.

@aastha25 mentioned she would be able to help you make more progress on #442 especially that she has seen the error before. So it could be a minor issue after all.

@findinpath
Copy link
Copy Markdown
Contributor Author

The issue is very specific to Trino

Yes, this is true. This is why the change I'm suggesting is done in coral-trino library.
The change in RelToTrinoConverter is adding the exceptional handling for handling UNION statements on which a resultset containing a char column is to mixed with a resultset containing a varchar column to cope with the trinodb/trino#9031 limitation from Trino.

@aastha25 mentioned she would be able to help you make more progress on #442

Sure, if there a possibility to work with SqlCallTransformer I don't have anything against following this path.
I've stumbled on doing the translation of SELECT * FROM tablea UNION SELECT char_col FROM tableb. I'm not sure how to go forward on this path.
The current PR (see the attached tests) can successfully handle such situations.

Also another point in going forward with #438 is the amount of code to be changed/maintained to address trinodb/trino#9031 use-cases is quite straight-forward. For the SqlCallTransformer implementation we need much more code to be introduced in the library.

@wmoustafa
Copy link
Copy Markdown
Contributor

@findinpath, the place where you made the changes in this PR (438) is moving to coral-common soon to convege the "Coral IR" in one place. That is why we cannot make Trino-specific changes there.

The type derivation of * and expansion of * on the SqlNode layer is not supported now. I think it is possible to do, but we can get to it later. My suggestion is to proceed with #442 and get back to * soon as a follow up.

@findinpath
Copy link
Copy Markdown
Contributor Author

RelToTrinoConverter is moving to coral-common soon to converge the "Coral IR" in one place.

Closing the current PR, because it doesn't have chances to land on master .
Continuing work with #442

@findinpath findinpath closed this Sep 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants